# Design and Implementation of Multi Operand Binary Tree Adder Using Verilog HDL

<sup>1</sup>N.C. CHANDU PRASANTH,

<sup>1</sup>ASSISTANT PROFESSOR, <sup>1</sup>Department of ECE, <sup>1</sup>St. Peter's Engineering College, Hyderabad, Telangana <sup>1</sup>chanduprasanthkvl@gmail.com

**Abstract:** The critical path of the ripple carry adder (RCA) based binary tree adder (BTA) is analysed to find possibilities for delay minimization. Based on the findings of the analysis, a new logic formulation and the corresponding RCA design are proposed for the BTA. The comparison results show that the proposed RCA design offers better efficiency in terms of area, delay and energy than the existing RCA. Using this RCA design, the BTA structure is proposed. The synthesis results reveal that the proposed 32-operand BTA provides a saving of 22.5% in area-delay product and in energy-delay product over the recent Wallace tree adder, which is the best among the available multi-operand adders. The authors have also applied the proposed BTA in recent multiplier designs to evaluate its performance. The synthesis results show that the performance of the multiplier designs improves significantly with the proposed BTA. Therefore, the proposed BTA design can be a better choice for developing area, delay and energy efficient digital systems for signal and image processing applications.

### Key Words — Binary Tree Adder, Ripple Carry Adder, Verilog.

### I. INTRODUCTION

Adders are basic and essential computational blocks in most digital logic and adiabatic logic-based circuits embedded in real-time application systems in the electronics and communication field. Therefore, an area-efficient adder is designed here to obtain a low-power circuit. Nowadays, most of these systems incorporate VLSI integrated circuits, driven by consumer demand for miniaturized, low-power application circuits. Reducing the number of transistors in a particular logic style while designing the full adder cells reduces area and leads to a considerable decrease in overall power consumption. Reducing leakage power is necessary to achieve a substantial reduction in static power consumption, and efficient power gating techniques are needed to attain longer battery life in miniaturized, portable and handheld systems. The shrinking of technology nodes in IC fabrication also leads to increased leakage current. Considering these concepts, in this research work an efficient power-gated adiabatic logic-based full adder cell is designed, which has low static and dynamic power consumption. This energy-efficient adder is incorporated inside the carry select adder. Thus, the designed carry select adder has very low static power.

### **II. LITERATURE SURVEY**

# Efstathiou, C., Owda, Z., Tsiatouhas, Y.: 'New high-speed multioutput carry look ahead adders', IEEE Trans. Circuits Syst. II, Express Briefs, 2013, 60, (10), pp. 667–671

With the advancements in the semiconductor industry, designing a high-performance processor is a prime concern. The multiplier is one of the most crucial parts in almost every digital signal processing application. This paper addresses the implementation of an 8-bit multiplier design employing a CMOS full adder, a full adder using Double Pass Transistor Logic (DPL) and multi-output carry look-ahead (CLA) logic. The DPL adder avoids the noise margin problem and the speed degradation at low supply voltages associated with complementary pass transistor logic (CPL) circuits. The multi-output carry look-ahead adder leads to significant improvement in the speed of the overall circuitry. The investigation is carried out with simulation runs in the HSPICE environment using 90 nm process technology at 25 °C. Finally, design guidelines are derived to select the most suitable topology for the desired applications. The investigation reveals that the multiplier design using the multi-output carry look-ahead adder is more speed efficient than the other two design strategies considered.

# 'A static low-power, high performance 32-bit carry skip adder'. Euromicro Symp. on Digital System Design, 2004, DSD 2004, Rennes, France, 2004, pp. 615–619

The design of a 32-bit carry-skip adder to achieve minimum delay is presented in this paper. A fast carry look-ahead logic using group generate and group propagate functions is used to speed up the performance of multiple stages of ripple carry adders. The group generate and group propagate functions are generated in parallel with the carry generation for each block. The optimum block sizes are decided by taking the critical path into account.

# Ramkumar, B., Kittur, H.M.: 'Low-power and area-efficient carry select adder', IEEE Trans. Very Large Scale Integr. Syst., 2012, 20, (2), pp. 371–375

The carry select adder (CSLA) is used in many computational systems to alleviate the problem of carry propagation delay by independently generating multiple carries and then selecting a carry to generate the sum. However, the CSLA is not area efficient because it uses multiple pairs of Ripple Carry Adders (RCA) to generate the partial sum and carry for each possible carry input; the final sum and carry are then selected by multiplexers (mux). The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of an RCA in the regular CSLA to achieve high speed and low power consumption.

# Amin, A.: 'Area-efficient high-speed carry chain', Electron. Lett., 2007, 43, (23), pp. 1258–1260

An improved carry chain circuit with carry-skip capability is described. The carry-skip logic allows an arbitrarily long carry chain without the need for intermediate buffers for signal restoration, leading to an implementation that is both fast and area-efficient. The chain can flexibly accommodate the technology-imposed maximum depth of the NMOS transistor pull-down stack.

### III. PROPOSED METHODOLOGY & IMPLEMENTATION

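The critical path of the RCA design can be minimized by performing the carry computation with complementary complex gates, such as the AND-OR-Invert (AOI) or OR-AND-Invert (OAI) gate, in place of the AND-OR (AO) gate. In static CMOS, an AO gate is typically an AOI gate followed by an inverter, so using the inverting gate directly removes one gate stage from each carry step. With the propagate signal $p_i$ and the generate signal $g_i$, the carry is expressed in the AOI form as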

$$\overline{c_i} = \underbrace{\overline{g_i + p_i \cdot c_{i-1}}}_{\text{AOI}}$$

The $i$th carry signal is therefore computed in complement form and cannot be used directly to compute the $(i+1)$th carry signal. The valid $(i+1)$th carry signal is instead expressed in terms of the complemented $i$th carry $\overline{c_i}$ as

$$c_{i+1} = \underbrace{\overline{\overline{g_{i+1}} \cdot (\overline{p_{i+1}} + \overline{c_i})}}_{\text{OAI}}$$

This operation can be performed by an OAI complementary gate, which produces the carry signal $c_{i+1}$ in normal form. However, the OAI-based carry computation requires the $p_{i+1}$, $g_{i+1}$ and $c_i$ signals in their complemented form.



Fig 1: Complementary Gates

• Boolean expressions with AOI-based carry generation:

$$p_{i} = a_{i} \oplus b_{i}; \quad g_{i} = a_{i} \cdot b_{i}$$
$$\overline{c_{i}} = \overline{g_{i} + p_{i} \cdot c_{i-1}}$$
$$s_{i} = p_{i} \oplus c_{i-1}$$

• Boolean expressions with OAI-based carry generation:

$$\overline{p_i} = \overline{a_i \oplus b_i}; \quad \overline{g_i} = \overline{a_i \cdot b_i}$$
$$c_i = \overline{\overline{g_i} \cdot (\overline{p_i} + \overline{c_{i-1}})}$$
$$s_i = \overline{p_i} \oplus \overline{c_{i-1}}$$
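The two sets of Boolean expressions can be checked against the full-adder truth table. The following is a minimal behavioural sketch in Python, not the Verilog used in this work. The function names `aoi_lc`, `oai_lc` and `check_cells` are illustrative assumptions, and the OAI cell is assumed to generate $\overline{p_i}$ with an XNOR gate and $\overline{g_i}$ with a NAND gate.

```python
from itertools import product

def aoi_lc(a, b, c_prev):
    """AOI-based cell: normal carry-in, complemented carry-out."""
    p = a ^ b                        # propagate
    g = a & b                        # generate
    s = p ^ c_prev                   # sum bit
    c_bar = 1 - (g | (p & c_prev))   # AND-OR-Invert: complemented carry
    return s, c_bar

def oai_lc(a, b, c_prev_bar):
    """OAI-based cell: complemented carry-in, normal carry-out."""
    p_bar = 1 - (a ^ b)              # XNOR gives the complemented propagate
    g_bar = 1 - (a & b)              # NAND gives the complemented generate
    s = p_bar ^ c_prev_bar           # XOR of two complements equals p XOR c
    c = 1 - (g_bar & (p_bar | c_prev_bar))  # OR-AND-Invert: normal carry
    return s, c

def check_cells():
    """Compare both cells with a reference full adder for all 8 input cases."""
    for a, b, c in product((0, 1), repeat=3):
        total = a + b + c
        s_ref, c_ref = total & 1, total >> 1
        assert aoi_lc(a, b, c) == (s_ref, 1 - c_ref)
        assert oai_lc(a, b, 1 - c) == (s_ref, c_ref)
    return True
```

Calling `check_cells()` exercises all eight input combinations; it confirms that the AOI cell returns the carry in complement form and that the OAI cell restores the normal polarity.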

The two 1-bit logic cells (AOI-LC and OAI-LC) are derived using the proposed logic formulation given above. The AOI-LC takes inputs $a_i$, $b_i$ and $c_{i-1}$ and computes the sum ($s_i$) and the intermediate carry-out in complement form ($\overline{c_i}$), while the OAI-LC computes the sum ($s_i$) and the carry-out ($c_i$) from the 1-bit inputs $a_i$, $b_i$ and the complemented carry-in $\overline{c_{i-1}}$. Using the AOI-LC and OAI-LC modules, an $m$-bit RCA design is proposed, as shown in Fig. 2 for an even value of $m$. In the proposed RCA design, the AOI-LC and OAI-LC modules are connected alternately because the AOI-LC generates its carry-out in complement form while the OAI-LC generates a normal carry-out.

The RCA takes $m$-bit inputs ($a$ and $b$) and an initial carry-in ($c_{in}$) to compute the sum ($s$) and carry-out ($c_{out}$) signals, where the output carry is equal to $c_{m-1}$. If $m$ is odd, the proposed RCA has an AOI-LC in the MSB position, which generates the carry-out signal in complement form. Therefore, the actual output carry is obtained by complementing the carry-out signal of the AOI-LC placed in the MSB position.
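The alternating connection of the two cells can be sketched behaviourally as follows. This is a Python model under stated assumptions rather than the Verilog description: the function name `proposed_rca` is hypothetical, and the AOI-LC is assumed to sit at bit 0 and every even bit position, so that an odd $m$ ends on an AOI-LC as described above.

```python
def proposed_rca(a, b, m, cin=0):
    """Add two m-bit integers with alternating AOI/OAI cells.

    Returns (sum, cout), where sum holds the m sum bits.
    """
    s = 0
    carry = cin        # value currently on the carry wire
    inverted = False   # True when the wire carries the complemented carry
    for i in range(m):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i % 2 == 0:
            # AOI-LC: normal carry in, complemented carry out
            p, g = ai ^ bi, ai & bi
            si = p ^ carry
            carry = 1 - (g | (p & carry))
            inverted = True
        else:
            # OAI-LC: complemented carry in, normal carry out
            p_bar, g_bar = 1 - (ai ^ bi), 1 - (ai & bi)
            si = p_bar ^ carry
            carry = 1 - (g_bar & (p_bar | carry))
            inverted = False
        s |= si << i
    # For odd m the last cell is an AOI-LC, so the carry-out must be inverted
    cout = 1 - carry if inverted else carry
    return s, cout
```

For example, `proposed_rca(9, 7, 4)` returns `(0, 1)`, since 9 + 7 = 16 needs a fifth bit. The extra inverter is only needed when `m` is odd.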



Fig 2: m-bit RCA

# Juni Khyat (UGC Care Group I Listed Journal)

# ISSN: 2278-4632 Vol-14 Issue-02 July 2024



Fig 3: Proposed Architecture of Binary Tree Adder

Proposed binary tree adder: The proposed BTA structure for $N = 8$ and $m = 4$ is shown in Fig. 3. It consists of seven RCAs: the first stage uses four 4-bit RCAs, the second stage uses two 5-bit RCAs and the third stage uses one 6-bit RCA. These RCAs are built from AOI-LC and OAI-LC modules. The proposed BTA structure is scalable to any values of $N$ and $m$.
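The stage structure described here can be illustrated with a short behavioural sketch in Python. The names `ripple_add` and `binary_tree_add` are hypothetical, each RCA is modelled as plain integer addition, and $N$ is assumed to be a power of two.

```python
def ripple_add(a, b, width):
    """Behavioural stand-in for a width-bit RCA; the result needs width + 1 bits."""
    assert 0 <= a < (1 << width) and 0 <= b < (1 << width)
    return a + b

def binary_tree_add(operands, m):
    """Sum N m-bit operands pairwise, stage by stage.

    Returns (total, stages), where stages lists (number of RCAs, RCA width)
    for each addition stage.
    """
    n = len(operands)
    assert n >= 2 and n & (n - 1) == 0, "N is assumed to be a power of two"
    level, width, stages = list(operands), m, []
    while len(level) > 1:
        stages.append((len(level) // 2, width))
        level = [ripple_add(level[i], level[i + 1], width)
                 for i in range(0, len(level), 2)]
        width += 1   # each stage grows the operand width by one bit
    return level[0], stages
```

With eight arbitrary 4-bit operands, `binary_tree_add([3, 6, 9, 12, 15, 8, 8, 2], 4)` returns the total 63 and the stage list `[(4, 4), (2, 5), (1, 6)]`, that is, seven RCAs in three stages.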

It can be seen from Fig. 3 that the first, second and third addition stages of the proposed BTA introduce, respectively, four gates (one XNOR, one OAI, one AOI and one XOR), three gates (one XNOR, one OAI and one XOR) and three gates (one AOI and two XOR) in the critical path, which is shown with a red dotted line.

For simplicity in delay estimation, equal delay is assumed for the AOI and OAI gates and, similarly, equal delay for the XOR and XNOR gates. Under this assumption, the generalized delay expression derived for the proposed BTA is

$$T_{\text{Prop-BTA}} = 2T_{\text{XOR}} + (m-2)T_{\text{AOI}} + (\log_2 N - 1)(T_{\text{AOI}} + 2T_{\text{XOR}})$$

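where $T_{\text{Prop-BTA}}$ represents the delay of the proposed BTA, $T_{\text{XOR}}$ is the XOR/XNOR gate delay and $T_{\text{AOI}}$ is the AOI/OAI gate delay. The proposed BTA has AOI/OAI gates in the critical path. Therefore, the delay of the proposed BTA is expected to be significantly less than that of the existing RCA-based BTA.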
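As a worked check of the delay expression, the sketch below evaluates it in Python. The function name `prop_bta_delay` is hypothetical and the unit gate delays are placeholders, not synthesis figures; $N$ is assumed to be a power of two.

```python
def prop_bta_delay(n_operands, m, t_xor=1.0, t_aoi=1.0):
    """Evaluate 2*T_XOR + (m-2)*T_AOI + (log2(N)-1)*(T_AOI + 2*T_XOR)."""
    stages = n_operands.bit_length() - 1      # log2(N) for a power of two
    assert 1 << stages == n_operands, "N is assumed to be a power of two"
    first_stage = 2 * t_xor + (m - 2) * t_aoi
    later_stages = (stages - 1) * (t_aoi + 2 * t_xor)
    return first_stage + later_stages
```

With unit delays, `prop_bta_delay(8, 4)` gives 10.0. This matches the count of four, three and three gates on the critical path of the three stages for $N = 8$ and $m = 4$.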

# **IV. IMPLEMENTATION RESULTS**



Fig 4: Proposed System RTL Schematic

Simulation waveform of the testbench over 0 to 1000 ns: eight 4-bit operands (a[3:0] to h[3:0]) and the 7-bit sum s[6:0].

Fig 5: Simulation Result of Proposed Binary Tree Adder


| Adder     | Bit width | Area | Delay | Less Area % | Less delay % |
|-----------|-----------|------|-------|-------------|--------------|
| RCA       | 8         | 304  | 0.318 | -           | -            |
|           | 16        | 608  | 0.586 | -           | -            |
|           | 32        | 1216 | 1.121 | -           | -            |
| Prop. RCA | 8         | 280  | 0.211 | 7.89        | 33.6         |
|           | 16        | 560  | 0.356 | 7.89        | 39.2         |
|           | 32        | 1120 | 0.646 | 7.89        | 42.3         |

### Table 1: Theoretical comparison for the proposed and existing RCAs
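The saving columns of Table 1 can be recomputed from the area and delay columns. The sketch below is a small Python check; the helper `saving_percent` and the dictionary layout are illustrative assumptions, and the numbers are copied from Table 1 in its own units.

```python
def saving_percent(existing, proposed):
    """Relative saving of the proposed design over the existing one, in percent."""
    return 100.0 * (existing - proposed) / existing

# bit width -> ((existing area, existing delay), (proposed area, proposed delay))
TABLE_1 = {
    8:  ((304, 0.318), (280, 0.211)),
    16: ((608, 0.586), (560, 0.356)),
    32: ((1216, 1.121), (1120, 0.646)),
}

def table_1_savings():
    """Return {bit width: (area saving %, delay saving %)} rounded to 2 decimals."""
    result = {}
    for width, ((area_e, delay_e), (area_p, delay_p)) in TABLE_1.items():
        result[width] = (round(saving_percent(area_e, area_p), 2),
                         round(saving_percent(delay_e, delay_p), 2))
    return result
```

The recomputed values agree with the tabulated savings to within about 0.1 percentage point; the small differences in the delay column are consistent with rounding.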

| Adder        | Bit width | Area | Delay | Less Area % | Less delay % |
|--------------|-----------|------|-------|----------------|--------------|
| RCA          | 4         | 6    | 4.614 | 34.5           | 65.2         |
|              | 5         | 7    | 4.696 | 29.6           | 64.1         |
|              | 6         | 9    | 5.273 | 23.04          | 58.7         |
| Prop. System |           | 43   | 8.226 | 48.2           | 40.8         |
| Exe. System  |           | 33   | 7.395 | 62.8           | 45.4         |

 Table 2: Comparison of Synthesis Results

# V. CONCLUSION

The RCA-based BTA is a widely used multi-operand adder (MOA) owing to its simple structure, which leads to an area and energy efficient design. However, the long carry propagation path of the RCA makes it poor in terms of delay performance. Therefore, in this paper, a delay analysis of the RCA-based BTA is presented. The comparison results show that the proposed RCA provides better efficiency in terms of area, delay and energy than the existing RCA. Using this RCA design, the BTA structure is proposed. We have also applied the proposed BTA in recent multiplier designs to evaluate its performance. The synthesis results show that the performance of the multiplier designs improves significantly with the proposed BTA. Therefore, the proposed BTA design can be a better choice for developing area, delay and energy efficient digital systems for signal and image processing applications.

# VI. FUTURE SCOPE

Any VLSI design must trade off area, power, delay and performance. The proposed design has considerable future scope: complementary gates can be introduced into existing systems for better results and further detailed analysis. With the help of complementary gates, the large structures used in industrial designs could also benefit.

#### REFERENCES

- [1] Amin, A.: 'Area-efficient high-speed carry chain', Electron. Lett., 2007, 43, (23), pp. 1258–1260
- [2] Efstathiou, C., Owda, Z., Tsiatouhas, Y.: 'New high-speed multioutput carry look-ahead adders', IEEE Trans. Circuits Syst. II, Express Briefs, 2013, 60, (10), pp. 667–671
- [3] Kogge, P.M., Stone, H.S.: 'A parallel algorithm for the efficient solution of a general class of recurrence equations', IEEE Trans. Comput., 1973, C-22, (8), pp. 786–793
- [4] Efstathiou, C., Vergos, H.T., Nikolos, D.: 'Fast parallel-prefix modulo $2^n + 1$ adders', IEEE Trans. Comput., 2004, 53, (9), pp. 1211–1216
- [5] Hoe, D.H., Martinez, C., Vundavalli, S.J.: 'Design and characterization of parallel prefix adders using FPGAs'. 2011 IEEE 43rd Southeastern Symp. on System Theory, Auburn, AL, USA, 2011, pp. 168–172
- [6] Vergos, H.T., Dimitrakopoulos, G.: 'On modulo $2^n + 1$ adder design', IEEE Trans. Comput., 2012, 61, (2), pp. 173–186
- [7] Didier, L.S., Jaulmes, L.: 'Fast modulo $2^n - 1$ and $2^n + 1$ adder using carry-chain on FPGA'. 2013 Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, USA, 2013, pp. 1155–1159
- [8] Daphni, S., Grace, K.V.: 'A review analysis of parallel prefix adders for better performance in VLSI applications'. 2017 IEEE Int. Conf. on Circuits and Systems (ICCS), Thiruvananthapuram, India, 2017, pp. 103–106